BBS Seminar, 1 November 2019, Basel

Probabilistic DAGs - Bayesian networks

Compact representation of multivariate probability distributions

Probabilistic graphical models for a set of variables \(\{ X_1, X_2, \ldots, X_n \}\) characterized by

  • a graphical structure, directed and acyclic, whose nodes are the variables
  • a probability model for each node describing the relationship with its parents
  • edges encode conditional independencies (any variable is conditionally independent of its non-descendants given its parents)

$P(X_1)P(X_2)P(X_3 \vert X_1, X_2)P(X_4 \vert X_2, X_3)$
e.g. $X_4 \perp\!\!\!\perp X_1 \vert (X_2, X_3)$

\(\{ X_1, X_2, \ldots, X_n \} \thicksim P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^n P(X_i \vert {\textbf{Pa}_i})\)
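As a small numerical check of the factorisation above, the sketch below builds the joint distribution of four binary variables from the four factors and confirms that \(X_4\) does not depend on \(X_1\) once \(X_2\) and \(X_3\) are known. All probability values and the helper names (`bern`, `joint`, `cond_x4`) are made up for illustration; they are not taken from the talk.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
p_x1 = 0.3                                                    # P(X1 = 1)
p_x2 = 0.6                                                    # P(X2 = 1)
p_x3 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9}   # P(X3 = 1 | X1, X2)
p_x4 = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.8}   # P(X4 = 1 | X2, X3)


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes `value` (0 or 1)."""
    return p if value == 1 else 1.0 - p


def joint(x1, x2, x3, x4):
    """P(X1) P(X2) P(X3 | X1, X2) P(X4 | X2, X3)."""
    return (bern(p_x1, x1) * bern(p_x2, x2)
            * bern(p_x3[(x1, x2)], x3) * bern(p_x4[(x2, x3)], x4))


# The product of the local models is a proper joint distribution.
total = sum(joint(*state) for state in product([0, 1], repeat=4))


def cond_x4(x1, x2, x3):
    """P(X4 = 1 | X1 = x1, X2 = x2, X3 = x3), computed from the joint."""
    numerator = joint(x1, x2, x3, 1)
    return numerator / (numerator + joint(x1, x2, x3, 0))


# Largest change in P(X4 = 1 | X1, X2, X3) when only X1 is flipped.
max_gap = max(abs(cond_x4(0, x2, x3) - cond_x4(1, x2, x3))
              for x2, x3 in product([0, 1], repeat=2))
print(total, max_gap)
```

The gap is zero up to floating-point error, which is the statement \(X_4 \perp\!\!\!\perp X_1 \vert (X_2, X_3)\) in numbers.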

Causal DAGs and Markov condition

Causal interpretation

Graphical representation of structural causal models

  • qualitative
  • directed edges imply direct causes (e.g. \(X_2\) is a direct cause of \(X_4\))
  • directed paths imply potential causes (e.g. \(X_1\) is a potential cause of \(X_4\))



All common causes, even if unmeasured, of any pair of variables on the graph are themselves on the graph
Causal Markov condition: given a causal DAG representation of a system, it also represents its conditional independence (CI) properties

Intervention effects from (a known) causal DAGs


Pearl's do-calculus

Average causal effect: \(P(Y=1 \vert do(X=1)) - P(Y=1 \vert do(X=0))\) [causal estimand]

Causal effect: \(P(Y=y \vert do(X=x)) = P_m(Y=y \vert X=x)\) [conditional probability in manipulated model]

Adjustment formula: \(P(Y=y \vert do(X=x)) = \sum_{z} P(Y=y \vert X=x, Z=z) P(Z=z)\)
[only in terms of preintervention probabilities]
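The adjustment formula can be checked on a toy example. The sketch below assumes a hypothetical binary model with a single confounder, \(Z \to X\), \(Z \to Y\), \(X \to Y\), with made-up probabilities. It computes the average causal effect in three ways: from the manipulated model, from the adjustment formula using only pre-intervention probabilities, and from the plain conditional \(P(Y=1 \vert X=x)\).

```python
from itertools import product

# Hypothetical model Z -> X, Z -> Y, X -> Y (illustrative numbers only).
p_z = 0.4                                    # P(Z = 1)
p_x_given_z = {0: 0.2, 1: 0.7}               # P(X = 1 | Z = z)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5,    # P(Y = 1 | X = x, Z = z)
                (1, 0): 0.3, (1, 1): 0.8}


def bern(p, value):
    return p if value == 1 else 1.0 - p


# Pre-intervention (observational) joint P(X, Y, Z), keyed by (x, y, z).
joint = {(x, y, z): bern(p_z, z) * bern(p_x_given_z[z], x)
                    * bern(p_y_given_xz[(x, z)], y)
         for x, y, z in product([0, 1], repeat=3)}


def prob(**fixed):
    """Probability of an event such as prob(x=1, z=0) under the joint."""
    total = 0.0
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total


def adjusted(x):
    """sum_z P(Y=1 | X=x, Z=z) P(Z=z), from observational quantities only."""
    return sum(prob(x=x, y=1, z=z) / prob(x=x, z=z) * prob(z=z)
               for z in (0, 1))


def manipulated(x):
    """P_m(Y=1 | X=x): remove P(X | Z) from the factorisation and set X = x."""
    return sum(bern(p_z, z) * p_y_given_xz[(x, z)] for z in (0, 1))


def naive(x):
    """Plain conditional P(Y=1 | X=x), which still carries the confounding."""
    return prob(x=x, y=1) / prob(x=x)


ace_adjusted = adjusted(1) - adjusted(0)
ace_true = manipulated(1) - manipulated(0)
ace_naive = naive(1) - naive(0)
print(ace_adjusted, ace_true, ace_naive)
```

In this toy model the adjusted and manipulated-model effects agree, while the unadjusted contrast is considerably larger because \(Z\) raises both \(X\) and \(Y\).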

Intervention effects in terms of propensity scores


Given a causal DAG


More generally: \(P(Y=y \vert do(X=x)) = \sum_{z} P(Y=y \vert X=x, {\textbf Pa}(X)=z) P({\textbf Pa}(X)=z)\)

\({\textbf Pa}(X)\): parents of \(X\) in the DAG

\[ P(Y=y \vert do(X=x)) = \sum_z \frac{P(Y=y, X=x, {\textbf Pa}(X)=z)}{\underbrace{P(X=x \vert {\textbf Pa}(X) = z)}_{\textrm{Propensity score} }}\]

Reweighting samples \(\Rightarrow\) fictitious population from post-intervention distribution
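The reweighting idea can be illustrated by simulation. The sketch below draws data from a hypothetical model \(Z \to X\), \(Z \to Y\), \(X \to Y\) (made-up probabilities, with \(Z\) playing the role of the parents of \(X\)), estimates the propensity score from the sample, and weights each unit with \(X = x\) by the inverse of its propensity. The function names and the sample size are choices made for this example only.

```python
import random

random.seed(1)

# Hypothetical model Z -> X, Z -> Y, X -> Y (illustrative numbers only).
P_Z = 0.4
P_X_GIVEN_Z = {0: 0.2, 1: 0.7}
P_Y_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.8}


def draw(p):
    return 1 if random.random() < p else 0


def simulate(n):
    """Draw n observational records (x, y, z) following the DAG order."""
    data = []
    for _ in range(n):
        z = draw(P_Z)
        x = draw(P_X_GIVEN_Z[z])
        y = draw(P_Y_GIVEN_XZ[(x, z)])
        data.append((x, y, z))
    return data


def propensity(data):
    """Empirical P(X = x | Z = z) for every (x, z) combination."""
    n_z = {0: 0, 1: 0}
    n_xz = {(x, z): 0 for x in (0, 1) for z in (0, 1)}
    for x, _, z in data:
        n_z[z] += 1
        n_xz[(x, z)] += 1
    return {(x, z): n_xz[(x, z)] / n_z[z] for (x, z) in n_xz}


def ipw_estimate(data, x_star):
    """Estimate P(Y=1 | do(X=x_star)) by inverse propensity weighting."""
    e = propensity(data)
    weighted = sum(y / e[(x, z)] for x, y, z in data if x == x_star)
    return weighted / len(data)


sample = simulate(200_000)
p_do1 = ipw_estimate(sample, 1)
p_do0 = ipw_estimate(sample, 0)
print(p_do1, p_do0, p_do1 - p_do0)
```

For these made-up parameters the post-intervention probabilities are 0.5 and 0.26, so the estimates should land close to those values.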

Given a DAG, graphical criteria (e.g. back-door and front-door) inform identifiability of causal effects

Inference in probabilistic graphical models

Two main tasks

  • Parameter estimation (for a given probabilistic model of a node conditional on its parents)
  • Structure learning (identify the connections, the more challenging task)

\[P(X_i \vert {\textbf Pa}_i) = ?\]

\[X_i \ ? \ X_j\]

Structure learning approaches

Constraint-based methods

  • PC (Peter and Clark) algorithm: reverse-engineering of the CIs of the distribution

Score and search algorithms

  • Scoring function typically derived from a Bayesian approach \[P(G \vert D) \propto P(D \vert G) P(G) \ \ \ \ \ \ \ \ \textrm{Likelihood $\times$ Prior}\]

  • e.g. Greedy search, hill climbing, dynamic programming, ILP

    MCMC: posterior sampling, with recent developments such as partition MCMC (Kuipers and Moffa, JASA 2017)

Hybrid methods

Markov equivalence

$$Y \perp \!\!\! \perp X \vert Z \equiv X \perp \!\!\! \perp Y \vert Z$$ $$ Y \perp \!\!\! \perp X $$

Even from perfect data \(\Rightarrow\) learning up to an equivalence class
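Markov equivalence can be checked with the standard graphical characterisation: two DAGs are equivalent exactly when they share the same skeleton and the same v-structures (colliders whose parents are not adjacent). The sketch below is a minimal illustration of that criterion on three-node graphs; the edge-list representation and the function names are choices made for this example.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders a -> c <- b whose parents a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges_1, edges_2):
    """Same skeleton and same v-structures."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


chain = [("X", "Z"), ("Z", "Y")]            # X -> Z -> Y
reverse_chain = [("Y", "Z"), ("Z", "X")]    # X <- Z <- Y
fork = [("Z", "X"), ("Z", "Y")]             # X <- Z -> Y
collider = [("X", "Z"), ("Y", "Z")]         # X -> Z <- Y

print(markov_equivalent(chain, fork))           # same class
print(markov_equivalent(chain, reverse_chain))  # same class
print(markov_equivalent(chain, collider))       # different class
```

The two chains and the fork all encode \(X \perp\!\!\!\perp Y \vert Z\) and cannot be told apart from data alone, whereas the collider encodes the marginal independence of \(X\) and \(Y\) and sits in a class of its own.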




CPDAG (Completed Partially Directed Acyclic Graph)

Causal discovery of DAGs - Some assumptions

Causal representation: There exists some DAG \(G\) that is a causal DAG representation of the system.

Causal Markov condition: The identical DAG \(G\) also represents (by means of the Markov condition) the probabilistic conditional independence properties of the system.

Causal faithfulness: The causal DAG \(G\) is a probabilistically faithful representation of the system

  • in plain English: all and only the independencies of the probability distribution are encoded in the graph
  • beware of polygamy: the same set of conditional independence relationships can be described by different DAGs, so the same distribution may be faithful to many DAGs

Causal sufficiency: No unmeasured confounders

A. Philip Dawid, 2009, Beware of the DAG!

A case study in Psychosis - Medical background

Psychosis: medical equivalent of the lay idea of madness

Schizophrenia: best known psychotic disorder, 0.5% prevalence in the general population

  • defined in terms of particular symptoms (delusions, hallucinations)
  • but many more: worry, anxiety, depression
  • the search for physical causes has had little success, e.g. 200 genes with small effects and unclear interactions, and neurophysiological abnormalities not consistently identified

Alternative explanations in the aetiology of Schizophrenia

  • Social causes, e.g. stressful experiences, traumas like sexual abuse and bullying
  • Interactions between symptoms

Bullying - a damaging experience

  • Effects likely to operate through cognitive-emotional biases (with lowered mood):

    • increased self-focus,
    • often catastrophic reduction in self-regard,
    • anticipation of further episodes,
    • negative interpretation of ambiguous events
  • Characterised by

    • abuse, intrusiveness, threat and the actuality of harm
    • exaggeration and distortion of power relationships
    • short and long term consequences
  • Commonly leads to

    • mood disorders and suicidal ideation
    • psychotic symptoms and disorders, particularly persecutory ideation

Bullying - research question


Is it possible that the cognitive and emotional consequences of bullying are responsible for the psychotic manifestations that are associated with it?


Focus of the study:

  • evaluate the link between
    • a history of being bullied
    • mood symptoms
    • psychotic symptoms (persecutory ideation, hallucinations)
  • quantify potential intervention effects on persecutory ideation

A case study in Psychosis - the data

Data from the English National Survey of Psychiatric Morbidity, 2007 and 2000

Psychological questionnaire

  • symptomatic and experiential variables
  • cross sectional
  • 8580 subjects in 2000 survey
  • 9 selected variables
  • Social variable: a history of bullying
  • Psychological/behavioural variables
    • Persecutory thinking
    • Auditory hallucinations
    • Mood instability
    • Depression
    • Anxiety
    • Worry
    • Sleep problems
    • Cannabis use (physical effect on emergence of psychotic symptoms?)

Binary data - BDe score

Bayesian Dirichlet equivalent (BDe) score (Heckerman and Geiger, UAI 1995)

Binary case for DAG \(G\)

  • node \(X\) with \(m\) parents \(\boldsymbol{Y}\)
  • each state of \(\boldsymbol{Y}\) has parameter \(\theta_{\boldsymbol{Y}}\)

\[P(X=1 \vert \boldsymbol{Y}) = \theta_{\boldsymbol{Y}}\]

  • beta prior on \(\theta\) with hyperparameter

\[\alpha=\beta=\frac{\chi}{2^m}\]

BDe metric is marginal likelihood \(P(D\vert G)\)

BDe score is posterior \(P(G\vert D)\)
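For a single binary node, the contribution to the marginal likelihood has a closed form: with a Beta\((\alpha, \beta)\) prior on each \(\theta_{\boldsymbol{Y}}\), every parent state contributes a ratio of beta functions, \(B(\alpha + N_1, \beta + N_0) / B(\alpha, \beta)\), where \(N_1\) and \(N_0\) count how often the node was 1 or 0 in that parent state. The sketch below computes this term on the log scale, taking the hyperparameter exactly as written above. The function names, the layout of the count table and the example counts are assumptions made for illustration; this is not the implementation used in the study.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def local_log_marginal(counts, chi=1.0):
    """Log marginal likelihood contribution of one binary node.

    `counts` maps each parent state (a tuple of 0/1 values of length m)
    to a pair (n0, n1): how often the node was 0 or 1 in that state.
    """
    m = len(next(iter(counts)))       # number of parents
    alpha = beta = chi / 2 ** m       # hyperparameter as stated above
    total = 0.0
    for n0, n1 in counts.values():
        total += log_beta(alpha + n1, beta + n0) - log_beta(alpha, beta)
    return total


# Hypothetical counts: the node follows its single parent most of the time.
with_parent = local_log_marginal({(0,): (40, 10), (1,): (10, 40)})
# The same 100 observations scored without the parent.
without_parent = local_log_marginal({(): (50, 50)})
print(with_parent, without_parent)
```

Because the score decomposes over nodes, summing such terms over all nodes gives \(\log P(D \vert G)\). In the made-up example the parent set that explains the counts receives the higher value.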

Partition MCMC to sample \(\thicksim P(G\vert D)\)

sample of 50,000 DAGs

Bullying - a case study with Bayesian networks


  • Social variable history of bullying assumed antecedent

  • Interactional model of symptoms explored by means of Bayesian networks (represented by DAGs)

  • double arrows imply equivalence classes

  • colour intensity reflects the strength of the links

  • For each graph and each variable derive potential intervention effect on downstream nodes (do(1) - do(0))
sample of 50,000 DAGs

Intervention effects: from a DAG ensemble from the posterior

  • posterior distribution of causal effects of row label on column label (downstream only)
  • 0 indicates no effect
  • truncated to (-.1, .5) for clarity
  • red line \(\rightarrow\) zero causal effect
  • box coloured if 95% credible interval does not straddle the zero line
  • numerical values \(\rightarrow\) posterior mean of the causal effect
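The summaries listed above can be obtained from the per-DAG effects in a few lines. The sketch below takes one effect value per sampled DAG and returns the posterior mean, an equal-tailed 95% credible interval, and whether that interval excludes zero. The input values are synthetic and the function name is hypothetical; the quantile rule (nearest order statistic) is one simple choice among several.

```python
def summarise_effects(effects, level=0.95):
    """Posterior mean and equal-tailed credible interval of sampled effects."""
    ordered = sorted(effects)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(round(tail * (n - 1)))]
    upper = ordered[int(round((1.0 - tail) * (n - 1)))]
    return {
        "mean": sum(ordered) / n,
        "lower": lower,
        "upper": upper,
        "excludes_zero": lower > 0 or upper < 0,
    }


# Synthetic example 1: every sampled DAG yields a positive effect.
always_downstream = [i / 1000 for i in range(100, 300)]

# Synthetic example 2: in 60% of the sampled DAGs the column variable is not
# downstream of the row variable, so the effect is exactly 0 there.
sometimes_downstream = [0.0] * 60 + [0.2] * 40

summary_a = summarise_effects(always_downstream)
summary_b = summarise_effects(sometimes_downstream)
print(summary_a)
print(summary_b)
```

In the second example the posterior mean is positive, yet the interval still reaches zero, so the corresponding box would not be coloured.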

Moffa et al, Schiz Bull 2017

Psychological significance of findings

  • Many hypothesised mediators did not meet the criteria for mediation: depression, anxiety, sleep disturbance, and hallucinations
  • Links between worry, mood instability and persecutory ideation could not be disambiguated from these data, hence there is no evidence that bullying leads to paranoia by disturbing mood
  • In addition to highlighting plausible causal links, the method also allows us to estimate the distributions of potential intervention effects
  • Studies underway currently involve attempts to alleviate persecutory ideation by reducing sleep disturbance and modifying depressive cognitions. Based on the present analysis they may prove unsuccessful

Significant limitation for psychology data
Inability to model feedback loops, partly addressed by Dynamic Bayesian networks

Dynamic Bayesian network - graph

2000 British National Psychiatric Morbidity survey and its 18-month follow-up data (N=2406)

  • one node for each variable at each time slice
  • assume stationarity over time
  • edges only displayed if they appear in at least 10% of the sampled DAGs
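The display rule in the last bullet amounts to computing, for each edge, the fraction of sampled DAGs that contain it. The sketch below does this for DAGs stored as collections of (parent, child) pairs. The node labels and the toy sample are hypothetical and only serve to show the thresholding.

```python
from collections import Counter


def edge_frequencies(dags):
    """Fraction of sampled DAGs in which each directed edge appears."""
    counts = Counter(edge for dag in dags for edge in set(dag))
    return {edge: count / len(dags) for edge, count in counts.items()}


def displayed_edges(dags, threshold=0.10):
    """Edges present in at least `threshold` of the sampled DAGs."""
    return {edge: freq for edge, freq in edge_frequencies(dags).items()
            if freq >= threshold}


# Toy sample of 20 DAGs over two time slices (labels are made up).
common = ("worry_t1", "sleep_t2")      # present in every sampled DAG
borderline = ("sleep_t1", "mood_t2")   # present in 2 of 20 DAGs (10%)
rare = ("mood_t1", "worry_t2")         # present in 1 of 20 DAGs (5%)

sampled = [[common] for _ in range(20)]
sampled[0].append(borderline)
sampled[1].append(borderline)
sampled[2].append(rare)

shown = displayed_edges(sampled)
print(shown)
```

With this toy sample the first two edges are displayed and the third is dropped, since it falls below the 10% threshold.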

sample of 10,000 DAGs

Kuipers, Moffa et al, Psych Med 2018

Considerations about the psychological significance

  • Worry appears to have a central role in the links between symptoms;
    • with plausible direct effects on insomnia, depressed mood and generalised anxiety.
  • The relationship between persecutory ideation and worry is indeterminate, consistent with cross-sectional analysis
  • Not all variables appear self-predicting of their state at the second time point
    • interestingly these are all affective variables (depression, social anxiety, and situational anxiety): a possibility is that they fluctuate significantly over the 18 months of follow-up
    • general anxiety, worry, sleep problems, and persecutory ideation are strongly self-predicting, suggesting they tend to persist over the follow-up period
  • The relationship over the 18-month follow-up period between persecutory ideation and worry is suggestive of a putative feedback loop

Thank you



Essential references

  • Moffa, Giusi, et al. “Using directed acyclic graphs in epidemiological research in psychosis: an analysis of the role of bullying in psychosis.” Schizophrenia Bulletin 43.6 (2017): 1273-1279.
  • Jack Kuipers*, Giusi Moffa*, Elizabeth Kuipers, Daniel Freeman and Paul Bebbington. “Links between psychotic and neurotic symptoms in the general population: an analysis of longitudinal British National Survey data using Directed Acyclic Graphs.” Psychological Medicine (2018): 1-8.
  • Kuipers, Jack, and Giusi Moffa. “Partition MCMC for inference on acyclic digraphs.” Journal of the American Statistical Association 112.517 (2017): 282-299.
  • Kuipers, Jack, Polina Suter, and Giusi Moffa. “Efficient Structure Learning and Sampling of Bayesian Networks.” arXiv preprint arXiv:1803.07859 (2018).
  • Kuipers, Jack, Thomas Thurnherr, Giusi Moffa, et al. “Mutational interactions define novel cancer subgroups.” Nature Communications 9, (2018).
  • Dawid, A. Philip. “Beware of the DAG!.” Causality: Objectives and Assessment. 2010.
  • Pearl, Judea. Causality. Cambridge university press, 2009.

Context

In the ocean of

  • data science
  • big data
  • machine learning
  • artificial intelligence
  • …and any other catchy
    buzz words of the moment…

is there still a place for statistics… or has it come to an end?

10 years have gone by since Google’s Chief Economist Hal Varian declared

“I keep saying the sexy job in the next ten years will be statisticians”

Hal Varian, The McKinsey Quarterly, January 2009
Google’s Chief Economist Hal Varian on Statistics and Data
For Today’s Graduate, Just One Word: Statistics

Is stats doomed to ML? Adding perspective…

Daniela Witten on Twitter in 2019

[Figure: AI_ML_LR]

Not to forget: the no free lunch principle

Big data is not everywhere…

One example of no evidence of striking performance advantage
Clinical prediction models

[Figure: systematic review, ML vs LR]

Is ML going to replace stats in clinical trials?

It is the duty of any scientist to make a clear distinction between hype and reality.

Public money spending should be guided by sound ethical principles.

Pearls from the (statistical) community

Bradley Efron? in 2006

Those who ignore statistics are condemned to reinvent it. Statistics is the science of learning from experience

Karl Broman in 2013 (!)

If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics. Data science is statistics

 

Making a case for statistics in the context of the current hype is not an easy stroll…

[Figure: do not call it statistics]

Any strategy needs a vision

  • What is the vision of the department with respect to statistics?
  • What prevented the establishment of a mathematical statistics chair before?
  • What underlying objective is the chair in statistics meant to achieve?
  • How does it relate to a business case or strategic argument at the faculty and more broadly at the university level?
  • What does it add with respect to more ML/AI focused groups?

A mild attempt at a vision

  • It is a given that statistics is present in different shapes and forms in many applied departments across the university (economics, psychology, pharmacy, biology, medicine, …)
  • Bringing the users of statistics together around one table, in a discussion forum that complements subject-matter expertise with the technical expertise of a methodologically focused group, is essential to foster interdisciplinary work.
  • It seems more important than ever to enable quality data-driven research founded on principled approaches, and ultimately to further strengthen the university-wide contribution to scientific progress, with even higher levels of excellence.
  • ML may (and should) certainly complement, but not replace statistics. With more data the need for rigorous methodologically founded (statistical) modelling is expected to go up and not down…

What could be a sensible strategy?

  • Is there a case for mathematical statistics? Is statistics a purely theoretical subject, and to whom would that be attractive? Insisting on its theoretical side to highlight the difference from data analytics seems more like a dead end (IMHO)…

  • The added value should rather be seen in providing rigorous methodological foundations for analytical approaches and prediction algorithms, without severing the link between theory and application

  • The focus should be on building interpretable models with translational value

  • Statistics is by its very essence an interdisciplinary field; to cite the famous statistician John Tukey:

“The best thing about being a statistician is that you get to play in everyone’s backyard.”

  • To make the game worthwhile for all players (in the backyard) we need to share an interest in developing application-driven methods, so the focus should be on applied rather than theoretical statistics

How statistics relates to computer science

A short parable by statistician Larry Wasserman in 2013 (!)

A scientist comes to a statistician with a question. The statistician responds by learning the scientific background behind the question. Eventually, after much thinking and investigation, the statistician produces a thoughtful answer. The answer is not just an answer but an answer with a standard error. And the standard error is often much larger than the scientist would like.

The scientist goes to a computer scientist. A few days later the computer scientist comes back with spectacular graphs and fast software.

Who would you go to?

How can statisticians get out of what may seem like an unfavourable position while at the same time preserving the necessary rigour?

Marry or part ways?

Shouldn’t potential solutions look into
marrying rather than divorcing statistics from computer science and machine learning?

A rigorous theoretical background is of course a conditio sine qua non for well-founded statistical work.

However, Wasserman also concludes (in 2013!) that we need to make sure that statistics students are competitive, so they need to be able to do serious computing, which means they need to understand data structures, distributed computing and multiple programming languages.

Trevor Hastie, Robert Tibshirani and Jerome Friedman’s
The Elements of Statistical Learning
is an all-time favourite (or a must?) on any reading list for foundations of Machine Learning

Science is still in need of the rigour of statistics… but statisticians should embrace innovation and adopt the tools of the 3rd millennium.

[Figure: The Elements of Statistical Learning]

It would be a loss to science to dismiss statistics as a relic of the past

Water, water, every where, Nor any drop to drink

The Rime of the Ancient Mariner, Samuel Taylor Coleridge, 1798

The rise and fall of statistics… It used to be Lies, damned lies and… statistics.

Should we change it to Lies, damned lies and… big data? But…

“…at the heart of extracting value from big data lies statistics.” (David Hand, 2014)

Data, data, everywhere, but let’s just stop and think
(David Hand @ https://www.statslife.org.uk/features/1166-data-data-everywhere-but-let-s-just-stop-and-think)

Appendix - Definition of statistics

Cambridge dictionary

the science of using information discovered from collecting, organizing, and studying numbers

Wikipedia

Statistics is the discipline that concerns the collection, organization, displaying, analysis, interpretation and presentation of data.

Oxford Learner’s Dictionaries

the science of collecting and analysing statistics